A new taxonomy of sublinear keyword pattern matching algorithms
Abstract
This paper presents a new taxonomy of sublinear (multiple) keyword pattern matching algorithms. Based on an earlier taxonomy by Watson and Zwaan [WZ96, WZ95], this new taxonomy includes not only suffix-based algorithms related to the Boyer-Moore, Commentz-Walter and Fan-Su algorithms, but also factor- and factor oracle-based algorithms such as Backward DAWG Matching and Backward Oracle Matching. In particular, we show how suffix-based (Commentz-Walter-like), factor-based and factor oracle-based sublinear keyword pattern matching algorithms can all be seen as instantiations of a general sublinear algorithm skeleton. In addition, we show that all shift functions defined for the suffix-based algorithms are in principle reusable for factor- and factor oracle-based algorithms. The taxonomy is based on deriving the algorithms from a common starting point by adding algorithm and problem details, in order to arrive at efficient or well-known algorithms. Such a presentation provides correctness arguments for the algorithms as well as clarity on how the algorithms are related to one another. In addition, it is helpful in the construction of a toolkit of the algorithms.
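As a rough illustration of what "sublinear" means for the suffix-based family, the following Python sketch shows a Set-Horspool-style multiple keyword scan. It is not the skeleton derived in the paper. The function names `build_shift` and `find_all` are hypothetical, the shift is the simple last-character shift (one of many possible shift functions), and suffix matching is done by direct string comparison where a real implementation would typically walk a trie of reversed keywords.

```python
def build_shift(keywords):
    """Precompute a last-character shift table (Horspool-style, assumed variant).

    shift[c] is the smallest distance from a non-final occurrence of c in any
    keyword to that keyword's right end, capped at the shortest keyword length.
    """
    lmin = min(len(k) for k in keywords)
    shift = {}
    for k in keywords:
        for j, c in enumerate(k[:-1]):          # skip the final character
            dist = len(k) - 1 - j
            shift[c] = min(shift.get(c, lmin), dist, lmin)
    return lmin, shift


def find_all(text, keywords):
    """Return (start, keyword) pairs for every keyword occurrence in text."""
    keywords = list(keywords)                   # assumed non-empty strings
    lmin, shift = build_shift(keywords)
    matches = []
    end = lmin                                  # exclusive end of the window
    while end <= len(text):
        # Report every keyword that is a suffix of text[:end].
        for k in keywords:
            if end >= len(k) and text.startswith(k, end - len(k)):
                matches.append((end - len(k), k))
        # Safe shift: no keyword can end between the old and new positions.
        end += shift.get(text[end - 1], lmin)
    return matches
```

For example, `find_all("abcab", ["ab", "cab"])` inspects only three window positions in a five-character text, because the character `b` does not occur in a non-final position of any keyword and therefore allows a shift of two.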
Related resources
A Taxonomy of Sublinear Multiple Keyword Pattern Matching Algorithms
This paper presents a taxonomy of sublinear keyword pattern matching algorithms related to the Boyer-Moore algorithm [BM77] and the Commentz-Walter algorithm [CW79a, CW79b]. The taxonomy includes, amongst others, the multiple keyword generalization of the single keyword Boyer-Moore algorithm and an algorithm by Fan and Su [FS93, FS94]. The corresponding precomputation algorithms are presented a...
Deriving the Boyer-Moore-Horspool algorithm
The keyword pattern matching problem has been frequently studied, and many different algorithms for solving it have been suggested. Watson and Zwaan in the early 1990s derived a set of well-known solutions from a common starting point, leading to a taxonomy of such algorithms. Their taxonomy did not include a variant of the Boyer-Moore algorithm developed by Horspool. In this paper, I present t...
Order-Preserving Matching with Filtration
The problem of order-preserving matching has gained attention lately. The text and the pattern consist of numbers. The task is to find all substrings in the text which have the same relative order as the pattern. The problem has applications in analysis of time series like stock market or weather data. Solutions based on the KMP and BMH algorithms have been presented earlier. We present a new s...
A Collection of New Regular Grammar Pattern Matching Algorithms
A number of new algorithms for regular grammar pattern matching are presented. The new algorithms handle patterns specified by regular grammars, a generalization of multiple keyword pattern matching and single keyword pattern matching, both considered extensively in and [14, Chapter 4] and in [18]. Among the algorithms is a Boyer-Moore type algorithm for regular grammar pattern matching, answeri...
Multiple Keyword Pattern Matching using Position Encoded Pattern Lattices
Formal concept analysis is used as the basis for two new multiple keyword string pattern matching algorithms. The algorithms addressed are built upon a so-called position encoded pattern lattice (PEPL). The algorithms presented are in conceptual form only; no experimental results are given. The first algorithm to be presented is easily understood and relies directly on the PEPL for matching. It...
Publication date: 1995